Search for: All records where Creators/Authors contains: "Wang, Lan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. The data-driven newsvendor problem with features has recently emerged as a significant area of research, driven by the proliferation of data across sectors such as retail, supply chains, e-commerce, and healthcare. Given the sensitive nature of the customer or organizational data often used in feature-based analysis, it is crucial to ensure individual privacy to uphold trust and confidence. Despite its importance, privacy preservation in the context of inventory planning remains unexplored. A key challenge is the nonsmoothness of the newsvendor loss function, which sets it apart from existing work on privacy-preserving algorithms in other settings. This paper introduces a novel approach to estimating a privacy-preserving optimal inventory policy within the f-differential privacy framework, an extension of the classical $$(\epsilon, \delta)$$-differential privacy with several appealing properties. We develop a clipped noisy gradient descent algorithm based on convolution smoothing for optimal inventory estimation to simultaneously address three main challenges: (i) unknown demand distribution and nonsmooth loss function, (ii) provable privacy guarantees for individual-level data, and (iii) desirable statistical precision. We derive finite-sample high-probability bounds for optimal policy parameter estimation and regret analysis. By leveraging the structure of the newsvendor problem, we attain a faster excess population risk bound than would follow from an indiscriminate application of existing results for general nonsmooth convex losses; our bound matches the rate for strongly convex and smooth loss functions. Our numerical experiments demonstrate that the proposed method achieves desirable privacy protection with only a marginal increase in cost. This paper was accepted by J. George Shanthikumar, data science. Funding: This work was supported by the National Science Foundation [Grants DMS-2113409 and DMS-2401268 to W.-X. Zhou, and FRG DMS-1952373 to L. Wang]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.01268.
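A minimal sketch of the core algorithmic idea in this abstract, assuming a linear policy q = x'theta, underage/overage costs b and h, and a Gaussian smoothing kernel; the `noise_scale` below is a placeholder, whereas the paper calibrates the injected Gaussian noise to a target f-differential-privacy budget. This is an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def smoothed_newsvendor_grad(theta, X, d, b, h, bw):
    # q = X @ theta is the feature-based order quantity; the nonsmooth
    # subgradient h*1{q > d} - b*1{d > q} is replaced by its Gaussian-kernel
    # convolution smoothing, h - (b + h) * Phi((d - q) / bw)
    q = X @ theta
    dloss_dq = h - (b + h) * norm.cdf((d - q) / bw)
    return dloss_dq[:, None] * X          # per-sample gradients, shape (n, p)

def clipped_noisy_gd(X, d, b, h, bw=0.5, clip=1.0, noise_scale=0.1,
                     lr=0.1, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iter):
        g = smoothed_newsvendor_grad(theta, X, d, b, h, bw)
        # clip each per-sample gradient to L2 norm <= clip
        scale = np.maximum(np.linalg.norm(g, axis=1) / clip, 1.0)
        g = g / scale[:, None]
        # average, then perturb with Gaussian noise; noise_scale is a
        # placeholder -- the paper calibrates it to the f-DP guarantee
        grad = g.mean(axis=0) + rng.normal(0.0, noise_scale * clip / n, p)
        theta -= lr * grad
    return theta
```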
  2. The vast majority of the literature on evaluating the significance of a treatment effect based on observational data has been confined to discrete treatments. These methods are not applicable to drawing inference for a continuous treatment, which arises in many important applications. To adjust for confounders when evaluating a continuous treatment, existing inference methods often rely on discretizing the treatment or on (possibly misspecified) parametric models for the effect curve. Recently, Kennedy et al. (J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 (2017) 1229–1245) proposed nonparametric doubly robust estimation for a continuous treatment effect in observational studies. However, inference for the continuous treatment effect is a harder problem, and to the best of our knowledge, a completely nonparametric doubly robust approach for inference in this setting is not yet available. We develop such a nonparametric doubly robust procedure in this paper for making inference on the continuous treatment effect curve. Using empirical process techniques for local U- and V-processes, we establish the test statistic's asymptotic distribution. Furthermore, we propose a wild bootstrap procedure for implementing the test in practice, and we define a version of the test procedure based on sample splitting. We illustrate the new methods via simulations and a study of a constructed dataset on the effect of nurse staffing hours on hospital performance. We implement our doubly robust dose-response test in the R package DRDRtest on CRAN.
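The paper's test is implemented in the R package DRDRtest on CRAN. As a language-neutral illustration of the wild bootstrap step it describes, here is a generic Python sketch; the reduction of the statistic to mean-zero per-observation contributions `psi` and the choice of Rademacher multipliers are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def wild_bootstrap_pvalue(psi, stat_fn, n_boot=2000, seed=0):
    # psi: mean-zero per-observation contributions to the test statistic,
    # shape (n,) or (n, k); stat_fn maps them to a scalar statistic
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    t_obs = stat_fn(psi)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Rademacher multipliers preserve each contribution's conditional
        # second moment while zeroing its conditional mean
        w = rng.choice([-1.0, 1.0], size=n)
        t_boot[b] = stat_fn(psi * w.reshape((n,) + (1,) * (psi.ndim - 1)))
    return (1 + np.count_nonzero(t_boot >= t_obs)) / (1 + n_boot)
```

Here `stat_fn` would be, for example, a sup- or L2-type functional of the localized process built from the contributions.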
  3. The National Alzheimer's Coordinating Center Uniform Data Set includes test results from a battery of cognitive exams. Motivated by the need to model the cognitive ability of low-performing patients, we create a composite score from ten tests and propose to model this score using a partially linear quantile regression model for longitudinal studies with non-ignorable dropouts. Quantile regression allows for modeling non-central tendencies, and the partially linear model accommodates nonlinear relationships between some of the covariates and cognitive ability. The data set includes patients who leave the study before its conclusion; ignoring such dropouts results in biased estimates when the probability of dropout depends on the response. To handle this challenge, we propose a weighted quantile regression estimator whose weights are inversely proportional to the estimated probability that a subject remains in the study. We prove that this weighted estimator is a consistent and efficient estimator of both the linear and nonlinear effects.
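A minimal sketch of the inverse-probability weighting idea above, simplified to a fully linear quantile regression (the paper's model is partially linear and longitudinal, with theory this sketch does not address). The retention probabilities are assumed to have been estimated already, e.g. from a dropout model.

```python
import numpy as np
from scipy.optimize import minimize

def ipw_quantile_regression(X, y, p_remain, tau=0.5):
    # p_remain: estimated probability that each subject remains in the
    # study; inverse weighting up-weights subjects who resemble dropouts
    w = 1.0 / p_remain

    def weighted_check_loss(beta):
        r = y - X @ beta
        # check (pinball) loss rho_tau(r) = r * (tau - 1{r < 0})
        return np.sum(w * r * (tau - (r < 0)))

    # derivative-free solver because the check loss is nonsmooth;
    # production implementations would use linear programming instead
    beta0 = np.zeros(X.shape[1])
    return minimize(weighted_check_loss, beta0, method="Nelder-Mead").x
```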
  4. Regularized quantile regression (QR) is a useful technique for analyzing heterogeneous data under potentially heavy-tailed error contamination in high dimensions. This paper provides a new analysis of the estimation/prediction error bounds of the global solution of $$L_1$$-regularized QR (QR-LASSO) and the local solutions of nonconvex regularized QR (QR-NCP) when the number of covariates exceeds the sample size. Our results build upon and significantly generalize earlier work in the literature. For certain heavy-tailed error distributions and a general class of design matrices, the least-squares-based LASSO cannot achieve the near-oracle rate derived under the normality assumption, no matter how the tuning parameter is chosen. In contrast, we establish that QR-LASSO achieves the near-oracle estimation error rate for a broad class of models under conditions weaker than those in the literature. For QR-NCP, we establish the novel result that all local optima within a feasible region have desirable estimation accuracy. Our analysis applies not only to the hard sparsity setting commonly used in the literature, but also to the soft sparsity setting, which permits many small coefficients. Our approach relies on a unified characterization of the global/local solutions of regularized QR via subgradients, using a generalized Karush–Kuhn–Tucker condition. The theory of the paper establishes a key property of the subdifferential of the quantile loss function in high dimensions, which is of independent interest for analyzing other high-dimensional nonsmooth problems.
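As a concrete instance of the convex problem analyzed above, the sketch below fits $$L_1$$-penalized median regression with scikit-learn's QuantileRegressor on simulated heavy-tailed data with more covariates than observations. The dimensions, t-distributed errors, and tuning parameter alpha are illustrative assumptions; the nonconvex-penalized variant (QR-NCP) would require a specialized solver not shown here.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p, s = 100, 300, 5                        # p >> n with an s-sparse truth
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_t(df=2, size=n)  # heavy-tailed errors

# L1-penalized median regression (QR-LASSO); alpha plays the role of the
# tuning parameter lambda and is chosen arbitrarily for illustration
model = QuantileRegressor(quantile=0.5, alpha=0.1, fit_intercept=False)
model.fit(X, y)
print("indices of nonzero coefficients:", np.flatnonzero(model.coef_))
```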
  5. Distributed dataset synchronization, or Sync for short, plays the role of a transport service in the Named Data Networking (NDN) architecture. A number of NDN Sync protocols have been developed over the last decade. In this paper, we conduct a systematic examination of NDN Sync protocol designs, identify common design patterns, reveal the insights behind different design approaches, and collect lessons learned over the years. We show that (i) each Sync protocol can be characterized by its design decisions on three basic components: dataset namespace representation, namespace encoding for sharing, and the change notification mechanism; and (ii) two or three types of choices have been observed for each design component. Through analysis and experimental evaluation, we reveal how different design choices influence the latency, reliability, overhead, and security of dataset synchronization. We also discuss the relationship between transport and application naming, the implications of namespace encoding for Sync group scalability, and the fundamental reason behind the need for Sync Interest multicast.
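To make the first design component above concrete, here is a toy Python model of a version-vector namespace representation, in the spirit of the state-vector family of NDN Sync protocols; all class and method names are invented for illustration, and namespace encoding, Interest multicast, and security are omitted entirely.

```python
class StateVector:
    """Toy model of one recurring design point: represent the shared
    dataset namespace as a map from producer prefix to latest sequence
    number, and notify peers by exchanging the whole (encoded) vector."""

    def __init__(self):
        self.seq = {}                        # producer prefix -> latest seq

    def publish(self, prefix):
        # a local producer publishes the next item under its prefix
        self.seq[prefix] = self.seq.get(prefix, 0) + 1

    def missing_from(self, other):
        # names this node still needs to fetch to catch up with `other`
        return [(p, s)
                for p, latest in other.seq.items()
                for s in range(self.seq.get(p, 0) + 1, latest + 1)]

    def merge(self, other):
        # after fetching, adopt the component-wise maximum of both vectors
        for p, latest in other.seq.items():
            self.seq[p] = max(self.seq.get(p, 0), latest)

# two peers diverge, exchange vectors, and reconcile
a, b = StateVector(), StateVector()
a.publish("/alice/chat")
a.publish("/alice/chat")
print(b.missing_from(a))   # [('/alice/chat', 1), ('/alice/chat', 2)]
b.merge(a)
```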